Skip to content

Conversation

@zhangchiqing
Copy link
Member

@zhangchiqing zhangchiqing commented Oct 9, 2025

Working towards #7910

⚠️ This PR is not a full refactor of storage operations using the functor pattern; rather, it’s intended to start a discussion around the pattern itself. Please refer to the comments for specific discussion points.

@codecov-commenter
Copy link

codecov-commenter commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 39.05325% with 103 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
storage/operation/functor.go 40.51% 62 Missing and 7 partials ⚠️
storage/operation/guarantees.go 0.00% 21 Missing ⚠️
storage/operation/approvals.go 0.00% 8 Missing ⚠️
storage/operation/payload.go 0.00% 5 Missing ⚠️

📢 Thoughts on this report? Let us know!

}
deferredBlockPersist.AddNextOperation(func(lctx lockctx.Proof, blockID flow.Identifier, rw storage.ReaderBatchWriter) error {
return operation.IndexLatestSealAtBlock(lctx, rw.Writer(), blockID, latestSeal.ID())
return operation.IndexingLatestSealAtBlock(blockID, latestSeal.ID())(lctx, rw)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We mentioned that any database operation requiring a lock context could be refactored using the functor pattern, but I think this case might be an exception—even after applying the refactor.

The functor isn’t particularly useful here since we don’t have the block ID until we start executing the deferred database operations.

I went ahead and refactored it anyway to illustrate my point, but in this case, it doesn’t provide any performance benefits over the original version and only adds unnecessary complexity.

Thoughts?

}

// make sure all payload guarantees are stored
for _, guarantee := range payload.Guarantees {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of storing each individual guarantee, we pass in all the guarantees to be stored together, so that instead of creating N functors, we create only one functor to insert and index N guarantees.

return fmt.Errorf("could not store guarantees: %w", err)
}

// make sure all payload seals are stored
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR, I'm using guarantees as an example to apply the functor pattern and raise the discussion. I'd like to get feedbacks first and settle on the functor pattern before applying this to the storing operation of the rest of payloads , such as seals, results etc.

))
}

type CollectionGuaranteeWithID struct {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why creating this struct?

Because we would like to insert N guarantees with one functor, instead of N functors. And we need to bundle all the data before passing to the functor.

Why placing this struct here?

It might not be the best place, but at least it's close to where it is being used (InsertAndIndexGuarantees). I'm open for ideas for a better place for this struct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we would like to insert N guarantees with one functor, instead of N functors.

Do we have any evidence that using N functors would have a non-negligible performance impact? I think it is likely to have an extremely small impact in practice, and don't think it is worth the complexity of optimizing for here.

Also, the need to pre-compute the guarantee IDs prior to passing to the functor is downstream of a lack of ID caching. General ID caching is a problem we will probably want to solve relatively soon, which would remove the need for this (we would just pass the Guarantee object around and call ID() as much as we want without per-call performance penalty). Again, unless there is evidence that we really need the optimization for this case, I suggest we opt for the simpler implementation.

return nil
}
errmsg := fmt.Sprintf("InsertAndIndexResultApproval failed with approvalID %v, chunkIndex %v, resultID %v",
approvalID, chunkIndex, resultID)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the err message, the functors like Overwriting and InsertingWIthMismatchCheck is too general that doesn't have the context. So I used WrapError to include more context.

storing := operation.InsertAndIndexResultApproval(approval)

return func(lctx lockctx.Proof) error {
if !lctx.HoldsLock(storage.LockIndexResultApproval) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant, because InsertAndIndexResultApproval will check the lock


// Upserting returns a functor, whose execution will append the given key-value-pair to the provided
// storage writer (typically a pending batch of database writes).
func Upserting(key []byte, val interface{}) func(storage.Writer) error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced by functors

return WrapError("InsertAndIndexGuarantees failed", BindFunctors(
HoldingLock(storage.LockInsertBlock),
WrapError("insert guarantee failed", OverwritingMul(guaranteeIDKeys, guarantees)),
WrapError("index guarantee failed", InsertingMulWithMismatchCheck(collectionIDKeys, guaranteeIDs)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping error for inserting multiple records is a bit challenging, I can't find a cleaner way to provide context for each individual write. So ended up just wrapping the process.

// Caller must ensure guaranteeID equals to guarantee.ID()
// Caller must acquire the [storage.LockInsertBlock] lock
// It returns [storage.ErrDataMismatch] if a different guarantee is already indexed for the collection
func InsertAndIndexGuarantee(guaranteeID flow.Identifier, guarantee *flow.CollectionGuarantee) Functor {
Copy link
Member Author

@zhangchiqing zhangchiqing Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is unused, but only for reference for now.

I initially created this function by combining InsertGuarantee and IndexGuarantee, and refactored with the functor pattern. But later I decided to use InsertAndIndexGuarantees, see comments there for my motivation.

I kept this one here for reference as it's easier to understand, will clean up after the discussion of functor pattern is settled.

@zhangchiqing zhangchiqing marked this pull request as ready for review October 10, 2025 15:42
@zhangchiqing zhangchiqing requested a review from a team as a code owner October 10, 2025 15:42
Comment on lines 64 to 70
// Overwriting returns a functor that overwrites a key-value pair in the storage.
// The value is serialized using msgpack encoding. If the key already exists,
// the value will be overwritten without any checks.
//
// This is typically used for operations where we want to update existing data
// or where we don't care about potential conflicts.
func Overwriting(key []byte, val interface{}) Functor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Overwriting returns a functor that overwrites a key-value pair in the storage.
// The value is serialized using msgpack encoding. If the key already exists,
// the value will be overwritten without any checks.
//
// This is typically used for operations where we want to update existing data
// or where we don't care about potential conflicts.
func Overwriting(key []byte, val interface{}) Functor {
// Upserting returns a functor that overwrites a key-value pair in the storage.
// The value is serialized using msgpack encoding. If the key already exists,
// the value will be overwritten without any checks.
//
// This is typically used for operations where we want to update existing data
// or where we don't care about potential conflicts.
func Upserting(key []byte, val interface{}) Functor {

I prefer the term "upsert": it matches the non-functor function with the same semantics, and better captures that we are either inserting or updating (we are only overwriting if something has already been written).

//
// This is used for operations where we want to ensure uniqueness and prevent
// accidental overwrites of existing data.
func InsertingWithExistenceCheck(key []byte, val interface{}) Functor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think separating this out from lock proof checking is a good idea. This function will not behave as expected if the user does not include lock checking separately. Currently there is not documentation of this critical requirement of the user of this function. We could add documentation, but I think it would be better to logically couple lock proof checking with corresponding read/write checks.

I like the idea, but I thinking adding too many layers of indirection will make it harder to write and verify correct use of locks. I think it would be better to couple each of these generic lock-requiring utility functions with HoldingLock.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lots of boilerplates, like the error to return, it allows us to work with higher abstraction.

We could also make a InsertingWithExistenceCheckAndLockCheck to bind with the lock checking, but IMO it's unnecessary.

So we are able to simplify the functor operation as:

func InsertAndIndexGuarantees(guaranteesWithID []*CollectionGuaranteeWithID) Functor {
    ...
	return BindFunctors(
		CheckHoldLockFunctor(storage.LockInsertBlock),
		UpsertMulFunctor(guaranteeIDKeys, guarantees),
		InsertFunctorMulWithMismatchCheck(collectionIDKeys, guaranteeIDs),
    )
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also make a InsertingWithExistenceCheckAndLockCheck to bind with the lock checking

I don't think we should have a separate function that also binds the lock check. I think InsertingWithExistenceCheck should remain as it is, except that it also does the lock check. The reason is that InsertingWithExistenceCheck must never be called on its own, it always must be called strictly after a lock check.

Currently these two things which must always happen together are separate in the API, and the fact that they must be manually called together every time for correct usage is not documented. I'm suggesting that we modify the API so that the only way to call InsertingWithExistenceCheck is to simultaneously define the required lock check.

Copy link
Member Author

@zhangchiqing zhangchiqing Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about keeping both:

func InsertAndIndexGuarantees(guaranteesWithID []*CollectionGuaranteeWithID) Functor {
    ...
	return BindFunctors(
+		// keep this lock check here to make the functor composition flexible
		CheckHoldLockFunctor(storage.LockInsertBlock), 
		UpsertMulFunctor(guaranteeIDKeys, guarantees),
-		InsertFunctorMulWithMismatchCheck(collectionIDKeys, guaranteeIDs),
+		InsertFunctorMulWithMismatchCheck(
+			storage.LockInsertBlock, // check if this lock is held before running the insert
+			collectionIDKeys, guaranteeIDs),
    )
}

//
// This is used for operations where we want to ensure data consistency and
// detect potential race conditions or conflicting updates.
func InsertingWithMismatchCheck(key []byte, val interface{}) Functor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same worry here about separating generic logic that must be lock-protected from the logic checking the lock, as mentioned above.

))
}

type CollectionGuaranteeWithID struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we would like to insert N guarantees with one functor, instead of N functors.

Do we have any evidence that using N functors would have a non-negligible performance impact? I think it is likely to have an extremely small impact in practice, and don't think it is worth the complexity of optimizing for here.

Also, the need to pre-compute the guarantee IDs prior to passing to the functor is downstream of a lack of ID caching. General ID caching is a problem we will probably want to solve relatively soon, which would remove the need for this (we would just pass the Guarantee object around and call ID() as much as we want without per-call performance penalty). Again, unless there is evidence that we really need the optimization for this case, I suggest we opt for the simpler implementation.

return UpsertByKey(w, MakePrefix(codeBlockIDToLatestSealID, blockID), sealID)
}

func IndexingLatestSealAtBlock(blockID flow.Identifier, sealID flow.Identifier) Functor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General suggestion on naming

I think explicitly using the Functor terminology in the name of functor-returning storage functions would be clearer than using the present tense.

  • Present tense vs imperative doesn't really communicate "this will be done in the future" to me
  • The "functor" version is usually named very similarly to the "non-functor" version (eg. Index vs Indexing) -- I feel it would be easy to gloss over and miss the distinction

I'm suggestion as a naming convention: $(name you would have used had it not been a functor) + "Functor"

Suggested change
func IndexingLatestSealAtBlock(blockID flow.Identifier, sealID flow.Identifier) Functor {
func IndexLatestSealAtBlockFunctor(blockID flow.Identifier, sealID flow.Identifier) Functor {

@github-actions
Copy link
Contributor

github-actions bot commented Oct 29, 2025

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants